fix: prevent duplicate artifact sends across turns in WeCom channel #2728
Open
highland0971 wants to merge 1 commit into bytedance:main from
# PR 2: Fix Duplicate Artifact File Sending Across Turns

## Problem

When using the `present_files` tool to generate file artifacts, the system sends **all historically generated files** in the conversation on every new turn, rather than only newly produced artifacts.

Additionally, if DeerFlow or its Gateway restarts (new `thread_id` assigned to the same IM conversation), **all previously sent files are re-sent** because the sent-artifact tracking was lost.

## Root Cause

The original `_extract_artifacts` function was designed to scan the message history and extract file paths from `present_files` tool calls **after the last human message** only. However:

1. The original implementation had a subtle bug: it iterated messages in **reverse** and stopped at the first `type == "human"` message, but `present_files` tool calls in **subsequent turns** could be missed if the reverse scan logic was incorrect.
2. There was **no persistence** of already-sent artifacts: every turn re-extracted artifacts from the message history, which contained all past tool calls.
3. After a restart, the `ChannelStore`'s `thread_id` mapping changed but there was no cross-restart tracking of sent artifacts.

## Fix

### store.py: New sent-artifact persistence API

- Added `get_sent_artifacts(channel_name, chat_id, topic_id=None)` and `set_sent_artifacts(channel_name, chat_id, paths, topic_id=None)` methods.
- Keys are stored as `sent_artifacts:<channel>:<chat_id>[:<topic_id>]` in the existing JSON-backed `ChannelStore`, ensuring **survival across server restarts**.
- Keyed by IM conversation identity (channel + chat), not by internal `thread_id`, so the record persists even when Gateway restarts assign a new internal thread.

### manager.py: New artifact diff logic

- Replaced the reverse-scan approach with a **state-based extraction** from the `artifacts` state key (already normalized by `present_file_tool`).
- Added a `_diff_artifacts()` function that compares extracted artifact paths against the store's `sent_artifacts` record for that conversation, returning only newly produced artifacts.
- `_handle_chat` and `_handle_streaming_chat` now pass `channel_name`, `chat_id`, `topic_id`, and `store` to the new diff logic.

## Usage Flow

- […] `thread_id`, but the `sent_artifacts` lookup uses `channel:chat_id` → previous records found → no duplicates sent.

## Testing

- […] sent on subsequent turns.
- […] records were correctly loaded from `store.json`.

## Files Changed

- `backend/app/channels/store.py`
- `backend/app/channels/manager.py`
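
## Illustrative sketch

The persistence key scheme and the diff step described under Fix can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `SketchChannelStore`, `diff_artifacts`, `demo`, the `"wecom"` channel name, the chat ID, and the file paths are all hypothetical stand-ins, and the real `ChannelStore` and `_diff_artifacts()` may differ in signature and behavior. Only the method names `get_sent_artifacts` / `set_sent_artifacts` and the key format are taken from the description above.

```python
import json
import tempfile
from pathlib import Path


class SketchChannelStore:
    """Hypothetical stand-in for ChannelStore: one JSON file mapping key -> value."""

    def __init__(self, path):
        self._path = Path(path)

    def _load(self):
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text(encoding="utf-8"))

    @staticmethod
    def _sent_key(channel_name, chat_id, topic_id=None):
        # Key format from the PR text: sent_artifacts:<channel>:<chat_id>[:<topic_id>]
        key = f"sent_artifacts:{channel_name}:{chat_id}"
        return f"{key}:{topic_id}" if topic_id is not None else key

    def get_sent_artifacts(self, channel_name, chat_id, topic_id=None):
        key = self._sent_key(channel_name, chat_id, topic_id)
        return list(self._load().get(key, []))

    def set_sent_artifacts(self, channel_name, chat_id, paths, topic_id=None):
        data = self._load()
        data[self._sent_key(channel_name, chat_id, topic_id)] = list(paths)
        self._path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def diff_artifacts(artifacts, already_sent):
    """Return the paths in `artifacts` that are not in `already_sent`, in order."""
    seen = set(already_sent)
    new_paths = []
    for path in artifacts:
        if path not in seen:
            seen.add(path)  # also drops repeats within the same turn
            new_paths.append(path)
    return new_paths


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store_file = Path(tmp) / "store.json"
        store = SketchChannelStore(store_file)

        # Turn 1: one artifact in state, nothing recorded yet.
        state = ["/out/report.md"]
        first = diff_artifacts(state, store.get_sent_artifacts("wecom", "chat-1"))
        store.set_sent_artifacts("wecom", "chat-1", first)

        # Turn 2: state now holds both files; only the new one should be sent.
        state = ["/out/report.md", "/out/chart.png"]
        sent = store.get_sent_artifacts("wecom", "chat-1")
        second = diff_artifacts(state, sent)
        store.set_sent_artifacts("wecom", "chat-1", sent + second)

        # Simulated restart: a fresh store object reading the same JSON file.
        restarted = SketchChannelStore(store_file)
        third = diff_artifacts(state, restarted.get_sent_artifacts("wecom", "chat-1"))
        return first, second, third
```

Because the record is keyed by channel and chat rather than by `thread_id`, the lookup after the simulated restart still finds the earlier entries, which is the property the fix relies on. In this sketch the caller records paths after diffing; whether the PR records them inside `_diff_artifacts()` or after a successful send is not stated in the description.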